Auto-configure AMDGPU_ASIC_ID_TABLE_PATH on ROCm startup #4060

Merged: danielhanchen merged 6 commits into main from fix/auto-amdgpu-asic-id-table-path on Feb 15, 2026

@danielhanchen danielhanchen commented Feb 15, 2026

Summary

This PR makes three related AMD startup/packaging fixes:

  1. Auto-configures AMDGPU_ASIC_ID_TABLE_PATH at Unsloth startup on ROCm environments so GPU name resolution can work without requiring sudo symlinks.
  2. Removes _filter_rocm_amdgpu_ids_fd2_noise wrappers and related fd2 filtering machinery entirely.
  3. Fixes unsloth[amd] install compatibility with uv by using a normal index-based bitsandbytes requirement.

Problem

On some AMD/ROCm setups:

  • torch.cuda.get_device_name(0) can print:
    • /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
  • GPU names fall back to generic values unless amdgpu.ids is discoverable.

Separately, uv pip install "unsloth[amd] @ git+https://github.com/unslothai/unsloth" can fail with:

  • Wheel version does not match filename for the bitsandbytes-1.33.7.preview URL (wheel metadata reports 0.49.2.dev0).
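
The uv error above reports that the version embedded in the wheel's filename disagrees with the version in the wheel's metadata. A minimal sketch of that comparison follows, assuming the standard wheel filename layout (name-version[-build]-python-abi-platform.whl). The tags in the example filename are hypothetical; only the two version strings come from the error message. Real installers normalize versions before comparing, so the raw string comparison here is a simplification.

```python
def wheel_filename_version(filename):
    """Extract the version field from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name-version-python-abi-platform, with an optional build tag after version
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    return parts[1]


def versions_match(filename, metadata_version):
    """Simplified check: raw string equality, no version normalization."""
    return wheel_filename_version(filename) == metadata_version


# Hypothetical filename; the platform tags are illustrative only.
example = "bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl"
mismatch_detected = not versions_match(example, "0.49.2.dev0")
```

With an index-based requirement such as bitsandbytes>=0.49.1, the installer picks a wheel from the index instead of a fixed URL, so this kind of filename/metadata disagreement does not come into play.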

Changes

unsloth/import_fixes.py

  • Added configure_amdgpu_asic_id_table_path():
    • Honors existing valid AMDGPU_ASIC_ID_TABLE_PATH if user already set it.
    • Runs only on ROCm-like environments (reuses _is_rocm_torch_build() detection).
    • Resolves candidate amdgpu.ids paths without importing torch:
      • torch-adjacent locations discovered from importlib.util.find_spec("torch")
      • system fallbacks (/usr/share/libdrm/amdgpu.ids, etc.)
    • Sets AMDGPU_ASIC_ID_TABLE_PATH to the first existing file.
    • No-op if nothing is found.
  • Removed _filter_rocm_amdgpu_ids_fd2_noise() and associated fd2 stderr redirection helpers.
  • Removed fd2 filter wrapping from the causal_conv1d probe path.
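
The lookup order described above can be sketched roughly as follows. This is not the code from unsloth/import_fixes.py: the function names, the torch-adjacent subdirectory, and the is_rocm flag (a stand-in for the _is_rocm_torch_build() check, whose logic is not shown here) are assumptions made for illustration.

```python
import importlib.util
import os
from pathlib import Path

ENV_NAME = "AMDGPU_ASIC_ID_TABLE_PATH"
# The PR names this system fallback; its other fallbacks are elided ("etc.").
SYSTEM_CANDIDATES = ("/usr/share/libdrm/amdgpu.ids",)


def iter_candidates(package="torch"):
    """Yield possible amdgpu.ids locations without importing the package."""
    try:
        spec = importlib.util.find_spec(package)  # locates, does not import
    except (ImportError, ValueError):
        spec = None
    if spec is not None and spec.origin:
        # Hypothetical torch-adjacent layout; the real subdirectories may differ.
        yield Path(spec.origin).parent / "share" / "libdrm" / "amdgpu.ids"
    for raw in SYSTEM_CANDIDATES:
        yield Path(raw)


def configure_table_path(candidates=None, environ=None, is_rocm=True):
    """Return the path in effect, or None if nothing was configured."""
    environ = os.environ if environ is None else environ
    configured = environ.get(ENV_NAME)
    if configured:
        try:
            if Path(configured).is_file():
                return configured  # honor a valid user-provided value
        except OSError:
            pass
    if not is_rocm:
        return None
    for candidate in iter_candidates() if candidates is None else candidates:
        try:
            if Path(candidate).is_file():
                environ[ENV_NAME] = str(candidate)  # first existing file wins
                return str(candidate)
        except OSError:
            continue
    return None  # no-op when nothing is found
```

Passing candidates and environ explicitly keeps the sketch testable without touching the real process environment.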

unsloth/__init__.py

  • Calls configure_amdgpu_asic_id_table_path() early, before disable_broken_causal_conv1d() and before torch import paths.
  • Removed the _filter_rocm_amdgpu_ids_fd2_noise() import and all startup with-statement wrappers that used it.
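
The startup pattern here is "configure the environment first, import the heavy library afterwards, then drop the helper from the namespace". A self-contained sketch of that ordering follows, assuming a stand-in helper and a made-up variable name (EXAMPLE_TABLE_PATH); it works on a copy of the environment and does not claim to show when ROCm or libdrm actually reads the real variable.

```python
import os
import sys


def configure_example_path(environ):
    """Stand-in for the real configure call; sets a demo variable if unset."""
    environ.setdefault("EXAMPLE_TABLE_PATH", "/usr/share/libdrm/amdgpu.ids")
    return environ["EXAMPLE_TABLE_PATH"]


env = dict(os.environ)  # work on a copy so the sketch has no side effects

# Record whether torch was already loaded when configuration ran; the
# early call only guarantees the ordering if this is False.
torch_already_imported = "torch" in sys.modules
configured_value = configure_example_path(env)

# Remove the helper once it has run, keeping the package namespace clean.
del configure_example_path
```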

pyproject.toml

  • Updated amd extra to use a normal index-based requirement:
    • bitsandbytes>=0.49.1 (platform-marked)
  • Removed amd-preview extra.

Why this approach

  • No root privileges needed for AMD ids path setup.
  • Early env configuration improves ROCm GPU naming behavior.
  • Avoids fd-level stderr capture complexity.
  • Keeps amd extra robust with both pip and uv.

Validation

  • python -m py_compile unsloth/import_fixes.py unsloth/__init__.py
  • ROCm-hinted smoke check with local source:
    • CONFIG_RETURN /usr/share/libdrm/amdgpu.ids
    • ENV_VALUE /usr/share/libdrm/amdgpu.ids
  • uv install checks:
    • Before the pyproject fix, reproduced the malformed-wheel failure with the amd extra.
    • After pyproject fix, succeeded with:
      • uv pip install --refresh "unsloth[amd] @ git+https://github.com/unslothai/unsloth@fix/auto-amdgpu-asic-id-table-path"

@gemini-code-assist
Contributor

Summary of Changes

Hello @danielhanchen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the Unsloth library's compatibility and user experience on ROCm-enabled systems. It introduces an automated mechanism to correctly configure the AMDGPU_ASIC_ID_TABLE_PATH environment variable, which is crucial for accurate AMD GPU name resolution. This eliminates a common pain point where users previously encountered "file not found" errors and had to resort to manual, privileged operations to fix their GPU identification. The solution is designed to be robust, respecting existing user settings and executing early in the application lifecycle to ensure proper functionality.

Highlights

  • Automated AMDGPU_ASIC_ID_TABLE_PATH configuration: The PR introduces automatic detection and setting of the AMDGPU_ASIC_ID_TABLE_PATH environment variable during Unsloth startup for ROCm environments.
  • Resolved amdgpu.ids file not found errors: This change prevents common errors where torch.cuda.get_device_name() fails to resolve AMD GPU names due to missing amdgpu.ids paths, eliminating the need for manual sudo symlinks.
  • Prioritized existing user configurations: The new configuration logic respects and honors any valid AMDGPU_ASIC_ID_TABLE_PATH already set by the user.
  • Early execution for full impact: The configuration function is called early in the Unsloth initialization process to ensure it takes effect before other ROCm/libdrm name lookups.

Changelog
  • unsloth/__init__.py
    • Imported the new configure_amdgpu_asic_id_table_path function.
    • Called configure_amdgpu_asic_id_table_path() at an early stage of initialization.
    • Removed the configure_amdgpu_asic_id_table_path function from the global namespace after execution.
  • unsloth/import_fixes.py
    • Defined new constants _AMDGPU_ASIC_ID_TABLE_PATH_ENV and _AMDGPU_ASIC_ID_CANDIDATE_PATHS for environment variable name and common amdgpu.ids locations.
    • Added _iter_amdgpu_asic_id_table_candidates function to generate potential paths for amdgpu.ids, including torch-adjacent and system-wide locations.
    • Implemented configure_amdgpu_asic_id_table_path function, which checks for an existing valid user-set path, verifies if the environment is ROCm-like, and then iterates through candidate paths to set AMDGPU_ASIC_ID_TABLE_PATH to the first found valid file.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces an auto-configuration mechanism for AMDGPU_ASIC_ID_TABLE_PATH on ROCm environments, which is a valuable improvement for GPU name resolution without requiring root privileges. The changes are well-structured, implementing a new function configure_amdgpu_asic_id_table_path in unsloth/import_fixes.py and integrating it early in unsloth/__init__.py. The approach correctly prioritizes existing user configurations and includes robust detection for ROCm environments and candidate paths. The use of del to clean up the imported function in __init__.py is a good practice for namespace management.

Comment thread unsloth/import_fixes.py
Comment on lines +1287 to +1291
    try:
        if configured_path.is_file():
            return str(configured_path)
    except Exception:
        pass

medium

The try...except Exception: pass block is overly broad. While it's good to handle potential issues with configured_path.is_file(), catching a generic Exception can mask other programming errors or unexpected issues. It's generally better to catch more specific exceptions like OSError or PermissionError that Path.is_file() might raise. If other exceptions are truly intended to be ignored, they should be logged for debugging purposes.

Suggested change
-    try:
-        if configured_path.is_file():
-            return str(configured_path)
-    except Exception:
-        pass
+    try:
+        if configured_path.is_file():
+            return str(configured_path)
+    except OSError as e:
+        logger.debug(f"Unsloth: Error checking configured AMDGPU_ASIC_ID_TABLE_PATH: {e}")
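
To illustrate the point of the suggestion: catching OSError still covers filesystem failures such as PermissionError (a subclass of OSError), while programming errors like passing None as a path are no longer silently swallowed. A minimal sketch, assuming a hypothetical helper name and logger; whether Path.is_file() raises at all for a given failure depends on the Python version.

```python
import logging
from pathlib import Path

logger = logging.getLogger("example")


def is_existing_file(path):
    """Return True if path is a regular file; log and return False on OS errors."""
    try:
        return Path(path).is_file()
    except OSError as exc:  # covers PermissionError and other filesystem errors
        logger.debug("Could not check %s: %s", path, exc)
        return False


def swallows_type_error():
    """Show that a bad argument type is not hidden by the narrow except."""
    try:
        is_existing_file(None)  # Path(None) raises TypeError
    except TypeError:
        return False
    return True
```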

Comment thread unsloth/import_fixes.py
Comment on lines +1298 to +1307
    try:
        if candidate.is_file():
            os.environ[_AMDGPU_ASIC_ID_TABLE_PATH_ENV] = str(candidate)
            if UNSLOTH_ENABLE_LOGGING:
                logger.info(
                    f"Unsloth: Set {_AMDGPU_ASIC_ID_TABLE_PATH_ENV}={candidate}"
                )
            return str(candidate)
    except Exception:
        continue

medium

Similar to the previous comment, the try...except Exception: continue block is very broad. Catching a generic Exception can hide important issues that might occur during the candidate.is_file() check or when setting the environment variable. It's recommended to catch more specific exceptions (e.g., OSError for file system operations) or at least log the exception details if the intent is to ignore them and continue.

Suggested change
-    try:
-        if candidate.is_file():
-            os.environ[_AMDGPU_ASIC_ID_TABLE_PATH_ENV] = str(candidate)
-            if UNSLOTH_ENABLE_LOGGING:
-                logger.info(
-                    f"Unsloth: Set {_AMDGPU_ASIC_ID_TABLE_PATH_ENV}={candidate}"
-                )
-            return str(candidate)
-    except Exception:
-        continue
+    try:
+        if candidate.is_file():
+            os.environ[_AMDGPU_ASIC_ID_TABLE_PATH_ENV] = str(candidate)
+            if UNSLOTH_ENABLE_LOGGING:
+                logger.info(
+                    f"Unsloth: Set {_AMDGPU_ASIC_ID_TABLE_PATH_ENV}={candidate}"
+                )
+            return str(candidate)
+    except OSError as e:
+        logger.debug(f"Unsloth: Error checking candidate AMDGPU_ASIC_ID_TABLE_PATH '{candidate}': {e}")
+        continue

@danielhanchen danielhanchen merged commit ff0603c into main Feb 15, 2026
4 checks passed
@danielhanchen danielhanchen deleted the fix/auto-amdgpu-asic-id-table-path branch February 15, 2026 05:52
abiswas-realadvice pushed a commit to abiswas-realadvice/unsloth that referenced this pull request May 14, 2026

* Auto-configure AMDGPU_ASIC_ID_TABLE_PATH on ROCm startup

* Remove ROCm fd2 amdgpu.ids noise filter wrappers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use PyPI bitsandbytes for amd extra to avoid malformed wheel URL

* Add amd-preview extra for bitsandbytes continuous wheel channel

* Keep amd extra on bitsandbytes>=0.49.1 and remove amd-preview

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>